Skip to content

Conversation

@julian-elastic
Copy link
Contributor

@julian-elastic julian-elastic commented Jun 25, 2025

Fixes _index LIKE <pattern> to always have normal text matching semantics.

Implement a generic ExpressionQuery and ExpressionQueryBuilder that can be serialized to the data node. Then the ExpressionQueryBuilder can build an Automaton using TranslationAware.asLuceneQuery() and execute it in Lucine.

Introduces a breaking change for LIKE on _index fields. The old like behavior is not correct and does not have normal like semantics from ESQL. Customers upgrading from old build to new build might see a regression, where the data changes due to the like filters on clustering produces different results, but the new results are correct.

Behavior for ESQL
New CCS to New => New behavior everywhere
Old CCS to New => Old behavior everywhere (the isForESQL flag is not passed in from old)
New CCS to Old => New behavior for new, old behavior for old (the isForESQL cannot be passed, old does not know about it).
Old CCS to Old => Old behavior everywhere

Closes #129511

@julian-elastic
Copy link
Contributor Author

@nik9000 This is the draft PR

@julian-elastic
Copy link
Contributor Author

julian-elastic commented Jul 3, 2025

I have addressed the 3 NOCOMMITS.
ExpressionBuilder.java - Removed the configuration, it is no longer needed.

Configuration.java - Tried to move the setting from the Configuration to LucenePushDownPredicate
Got it working for LucenePushDownPredicate.from()
Did not get it working for LucenePushDownPredicate.forCanMatch(). I think that is where it is actually needed. How do I get the setting value from DataNodeComputeHandler::startComputeOnDataNodes()?

FilterTests.java
Added the EsqlTestUtils.TEST_CFG so it is no longer null.
Also there was a bug with a newly added test, I fixed it by using an older version so serialization is not available. Please review the fix.

@nik9000 nik9000 added the v9.1.0 label Jul 7, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @julian-elastic, I've updated the changelog YAML for you.

var wildcardQuery = QueryParser.escape(BytesRefs.toString(prefix.fold(FoldContext.small()))) + "*";

return new WildcardQuery(source(), fieldName, wildcardQuery);
return new WildcardQuery(source(), fieldName, wildcardQuery, false, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be false, true? So we force a string match in case folks are running START_WITH against _index?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for ENDS_WITH I suppose

Copy link
Contributor Author

@julian-elastic julian-elastic Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I made the change you requested to false, true. Then I tried adding UTs to cover the ENDS_WITH/BEGINS_WITH on index. It seems like ENDS_WITH/BEGINS_WITH is broken on an index, similar to LIKE before the change, as they all used WildcardQuery. It is fixed in some cases with my change, broken with others. If we want comprehensive fix, I can work on it, but it will delay the LIKE PR and it is getting pretty big already.
Do we want to stick with the current broken behavior and handle ENDS_WITH/BEGINS_WITH in a separate PR? I want to get this one checked in first if possible.

To make matters worse QueryParser.escape() in the line right above is the wrong escape sequence for our PR. If you call it on "remote-index" it will escape the dash and change the wildcardQuery to "remote\-index" which is wrong I think and will produce no data. QueryParser.escape() does regular expression escaping, not wildcard escaping. It seems the correct escaping is in WildcardFieldMapper.escapeWildcardSyntax(). But changing it here, might break more cases/cause different data if we CCS to an older server that uses the old WildcardQuery workflow with the new escaping. So it is not trivial to change the false to true. I think keeping it as false should preserve the old wrong behavior, so no regressions and buys us time to investigate properly and handle all the cases. I think we need to discuss...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this and will fork it into an open issue.

@julian-elastic julian-elastic requested review from a team as code owners July 7, 2025 21:27
@github-actions
Copy link
Contributor

github-actions bot commented Jul 7, 2025

@julian-elastic julian-elastic marked this pull request as draft July 7, 2025 21:31
@julian-elastic julian-elastic removed request for a team July 7, 2025 21:32
@nik9000 nik9000 marked this pull request as ready for review July 8, 2025 18:57
var wildcardQuery = QueryParser.escape(BytesRefs.toString(prefix.fold(FoldContext.small()))) + "*";

return new WildcardQuery(source(), fieldName, wildcardQuery);
return new WildcardQuery(source(), fieldName, wildcardQuery, false, false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We talked about this and will fork it into an open issue.

@nik9000
Copy link
Member

nik9000 commented Jul 8, 2025

Let's get this in.

Once it's landed I can explain the backport process. It's even more TransportVersion fun.

@julian-elastic
Copy link
Contributor Author

Merged with #130849

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

:Analytics/ES|QL AKA ESQL >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.19.0 v9.1.0 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ESQL: Filtering on _index is ignored

3 participants